Abstract



We present a unified approach to describing the kinetics of folding of proteins
and RNA. The conceptual basis for this framework is the notion that biomolecules
are topologically frustrated, owing both to their polymeric nature and to the
presence of conflicting energies. As a result, the free energy surface has, in
addition to the native basin of attraction (NBA), several competing basins of
attraction. A rough free energy surface results in direct and indirect pathways
to the NBA, i.e., a kinetic partitioning mechanism (KPM). The KPM leads to a
foldability principle according to which fast folding sequences are characterized
by the folding transition temperature $T_F$ being close to the collapse transition
temperature $T_\theta$, at which a transition from the random coil to the compact
structure takes place. Biomolecules with $T_\theta \approx T_F$, such as small
proteins and tRNAs, are expected to fold rapidly with two-state kinetics. Estimates
for the multiple time scales in KPM are also given. We show that experiments on
proteins and RNA can be understood semi-quantitatively in terms of the kinetic
partitioning mechanism.

Introduction 

The pioneering experiments of Anfinsen and coworkers [1] in the early sixties showed that
the folding of proteins into a unique structure with a well defined three-dimensional topology
is a self-assembly process; that is, the information needed to reach the native conformation
is encoded in the primary sequence itself. Although these experiments were instrumental in
demonstrating the possibility that the native conformation of proteins corresponds to the
global free energy minimum, they did not address the question of how the native state is
reached on a biologically relevant time scale, $t_B$. The question of kinetic accessibility of the
native conformation came into focus when Levinthal argued that the time for a random
search of all available conformations far exceeds $t_B$. This seemed to imply that, in order for
proteins to reach the native conformation on the time scale $t_B$, there have to be preferred
pathways that essentially limit the search of the conformational space. The Levinthal
paradox, simplistic as it is, has served as an intellectual impetus to understand how proteins
find their native conformations in a relatively short time.

In the last several years, through a combination of sophisticated experiments [2-10] and
studies of minimal models of proteins [11-20], a new framework for understanding folding
kinetics has emerged. Several recent reviews summarize various aspects of the theoretical
framework [21-23]. In this article we describe a complementary but different
perspective on how biomolecules (both proteins and RNA) reach their native conformations
under folding conditions. The basic idea behind all theoretical studies is that the underlying
topography of the free energy landscape of biomolecules is rugged [ref], consisting of many
minima separated by barriers of varying heights. It should be stressed that the notion of
a complicated free energy surface (FES) has been invoked in a variety of contexts. The
rugged nature of the potential energy surface was introduced to understand slow dynamics in
structural glasses nearly thirty years ago [ref]. More recently, extensive investigations have
revealed that the hallmark of several classes of disordered systems is that the energy
surface is complex, which in turn leads to activated scaling and non-Arrhenius temperature
dependence of transport coefficients. Natural proteins are unique in the sense that, despite
having a complicated FES, there is a dominant basin of attraction that is accessible on the
time scale $t_B$. Therefore, the challenge is to understand how a polypeptide chain explores the
complex FES in order to reach the global free energy minimum. A natural, but tautological,
answer is that biomolecular sequences have evolved to fold rapidly. The major contribution
of theoretical studies is to show how the rapid assembly of proteins and RNA takes place by
examining the kinetic processes that are involved in the exploration of the complicated FES.
In addition, spurred in part by the theoretical developments, new experimental techniques
have been developed to provide details (almost at the atomic level) of the events in the
folding process on time scales ranging from tens of nanoseconds to submilliseconds. The
joint efforts of theoreticians and experimentalists are leading to rapid progress
in our ability to describe, in some instances quantitatively, the kinetics of biomolecular
self-assembly.



In a recent article [ref] we suggested that concepts developed in the context of
protein folding may be used to understand the kinetics of RNA folding. The purpose of this
review is to describe a unified approach that uncovers the global features
expected in the folding kinetics of proteins and RNA. We also provide comparisons between
theoretical predictions and experimental measurements. Our goal is to point out that one
can describe in some detail the general kinetic principles of folding of biomolecules from the
statistical mechanics perspective. Furthermore, we will show that the various time constants
can be estimated (at least to within an order of magnitude) in terms
of experimental parameters, which allows the theoretical concepts to be validated.

Topological Frustration and the Kinetic Partitioning Mechanism 

Most of the qualitative features of the folding kinetics of biomolecules can be understood
by introducing the notion of topological frustration. A crucial feature of proteins is that
the primary sequence contains a certain fraction of monomers that are hydrophobic. The
fraction of hydrophobic residues in globular proteins is in slight excess of 0.5. The linear
density of the hydrophobic residues is roughly uniform along the contour of the polypeptide
chain, which means that the hydrophobic residues are dispersed throughout the chain. If this
were not the case, proteins would tend to aggregate. A consequence of the uniform density
of hydrophobic residues is that on any length scale $l$, which is not equal to the size of
the chain, there is a propensity for the hydrophobic residues to form tertiary contacts under
folding conditions. The resulting structures, formed by contacts between proximal
hydrophobic residues, would in all likelihood be incompatible with the unique global fold.
The incompatibility of the structures on local length scales with the native conformation
leads to topological frustration.

It is important to appreciate that topological frustration is an inherent consequence 
of the polymeric nature of proteins (connectivity of residues) as well as the presence of 
competing interactions (hydrophobic species, which prefer to form compact structures, and 
hydrophilic residues, which are better accommodated by extended conformations). Thus, all 
proteins are topologically frustrated, with long chains being more so than smaller ones. A 
direct consequence of topological frustration in proteins is that the underlying topography 
of the free energy surface is complex, consisting of many minima that are separated by a 
distribution of barriers. We have recently shown that similar considerations apply to 
RNA as well. 

The nature of the low lying minima is easy to describe qualitatively. On any length scale
$l$ there are numerous ways of constructing structures that are in conflict with the global fold.
Many, perhaps most, of these structures would have high free energies and consequently be
unstable to thermal fluctuations. However, some of these structures are truly stable low
energy minima, corresponding to conformations that can have many structural features in
common with the native state. The energetic difference between these low energy misfolded
structures and the native state can be easily compensated by the entropy associated with
the misfolded structures. Thus, such structures potentially act as kinetic traps in a typical
folding experiment and slow down the rate of protein folding.

The fundamentals of the kinetic partitioning mechanism (KPM) can be deciphered from
the concept that there are low energy minima (in which the proteins are misfolded) separated
by free energy barriers from the native state. Imagine an ensemble of denatured molecules
under folding conditions (achieved by diluting the concentration of the denaturant, for
example) in search of the deepest basin of attraction in the rugged free energy landscape. A
fraction $\Phi$ of the molecules would map onto the native basin of attraction (NBA) directly
and would reach it rapidly without being trapped in other states. The remaining fraction,
$1 - \Phi$, would inevitably be stuck in one or several of the low energy misfolded structures,
and would only reach the native state on a longer time scale by suitable activation processes.
Thus, because of the topological frustration that gives rise to a rugged free energy landscape,
the pool of denatured molecules partitions into fast folders, which reach the native state
directly, and slow folders, which reach it by indirect off-pathway processes.

A schematic sketch of the KPM is shown in Fig. (1). The outline of the kinetic scheme
that emerges from Fig. (1) can be conveniently written as

$$U \xrightarrow{\;k_1\;} F, \qquad U \xrightarrow{\;k_2\;} \{MS\} \xrightarrow{\;k_3\;} F$$

where U refers to the unfolded states, F is the folded state, and {MS} denotes the
collection of the low energy misfolded states. For simplicity we have not indicated the rates
for the backward processes in the above kinetic scheme.

The yield of the fast process is given by the partition factor $\Phi$. Since the shape and
structure of the underlying FES determine $\Phi$, it becomes apparent that $\Phi$ depends not only
on factors intrinsic to the sequence but also on external conditions such as pH, temperature
and ionic strength. Furthermore, $\Phi$ can be easily altered by mutations, so that a wild type
protein with $\Phi \approx 1$ can be made into a slow or moderate folder. In the subsequent sections
we will discuss in some detail the dominant time scales that arise in the KPM. We will
also show that various experiments can be at least qualitatively understood in terms of the
kinetic partitioning mechanism.
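
As an illustration of the kinetic scheme above, the following minimal sketch assumes irreversible first-order kinetics with rates $k_1$ (U to F), $k_2$ (U to {MS}) and $k_3$ ({MS} to F). Under that simplifying assumption, which is ours and not part of the scheme itself, the partition factor is $\Phi = k_1/(k_1 + k_2)$. The function names and the rate values are hypothetical and are chosen only to show how a fast phase of amplitude close to $\Phi$ separates from a slow phase.

```python
import math


def partition_factor(k1, k2):
    """Fraction of molecules folding directly (assumes first-order competition)."""
    return k1 / (k1 + k2)


def kpm_populations(t, k1, k2, k3):
    """Populations (U, MS, F) at time t for U->F (k1), U->MS (k2), MS->F (k3).

    Assumes irreversible first-order steps, all molecules in U at t = 0,
    and k3 != k1 + k2 (needed for the closed-form solution used here).
    """
    ku = k1 + k2  # total rate of leaving the unfolded ensemble
    if math.isclose(ku, k3):
        raise ValueError("closed form requires k3 != k1 + k2")
    u = math.exp(-ku * t)
    ms = k2 / (ku - k3) * (math.exp(-k3 * t) - math.exp(-ku * t))
    f = 1.0 - u - ms  # populations are conserved
    return u, ms, f


# Hypothetical rates in 1/s, chosen so that the partition factor is 0.14.
k1, k2, k3 = 14.0, 86.0, 0.1
print("partition factor:", partition_factor(k1, k2))
for t in (0.0, 0.1, 10.0, 100.0):
    u, ms, f = kpm_populations(t, k1, k2, k3)
    print(f"t = {t:6.1f} s  U = {u:.3f}  MS = {ms:.3f}  F = {f:.3f}")
```

At times long compared with $1/(k_1 + k_2)$ but short compared with $1/k_3$, the folded population in this sketch plateaus near $\Phi$ before the slow phase completes the reaction.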



Experimental Evidence for KPM 



There are several recent experimental studies which support the basic ideas of the kinetic
partitioning mechanism [ref]. These experiments suggest that foldable proteins can be
divided into two classes. The first consists of fast folders, which reach the native conformation
in a two-state kinetic process without being trapped in any intermediates. Typically fast folding
proteins are relatively small. From our theoretical perspective one would conclude that the
$\sigma$ values (see Eq. (1) below) for these are small and the underlying energy landscape is
dominated by a single basin of attraction corresponding to the native conformation. The
time constant for reaching the native conformation is of the order of a few milliseconds,
which is consistent with the theoretical prediction given in Eq. (2). These proteins have a
partition factor, $\Phi$, that is quite close to unity.

The other class of proteins consists of the moderate folders, which exhibit the multiple kinetic
phases predicted by the KPM. The most detailed verification of KPM has come from the study of the
refolding of lysozyme. In important early experiments Radford et al. [17,28] observed that the
protection kinetics in hydrogen exchange labeling experiments are well described by biphasic
kinetics. If we follow the interpretation suggested by Thirumalai and Guo [29] that the
fast phase in these experiments describes the mechanism of refolding to the native state by
the nucleation-collapse process, then the corresponding amplitude gives an estimate of the
partition factor. With this interpretation the experiments of Radford et al. [17,28] suggest
that $\Phi \approx 0.25$ at T = 20°C, pH = 5.2 and a guanidinium chloride (GdmCl)
concentration of 0.54 M.

More recently, Kiefhaber [ref] has performed an ingenious experiment that very clearly
shows the full range of kinetic behavior predicted by KPM. He used interrupted refolding
experiments on hen egg lysozyme to directly measure the value of $\Phi$. The refolding
experiments are initiated by diluting completely unfolded lysozyme into final folding conditions
(0.6 M GdmCl, pH = 5.2, and T = 20°C). At various times the folding is interrupted by
transferring the solution to 5.3 M GdmCl and pH = 1.8. Apparently, under these conditions
the native lysozyme unfolds completely in about 20 s, while any partially formed misfolded
structures unravel in a few milliseconds. Thus, the amplitude of the slow unfolding process
gives the amount of the native structure when the folding process is interrupted. By varying
the time of interruption a history of the kinetics of formation of the native state can be
constructed. Kiefhaber shows, by analyzing the folding kinetics at 0.6 M GdmCl, that the
partition factor $\Phi$ is 0.14, which implies that 14% of the initial population of denatured
lysozyme reaches the native state in 50 ms. The majority of the population gets to the native
conformation by indirect processes that involve transitions out of the kinetic traps. As
shown below, the time constant for this process estimated using Eq. (5) is about 100 ms, which is in
rough agreement with the experimental estimate of 420 ms [ref].



Dominant Time Scales in KPM 



The basic ideas leading to the KPM described above have been substantiated
using computer simulations of simplified models of proteins [ref]. The dependence
of the various parameters characterizing KPM, namely $\Phi$, $k_1$, $k_2$, $k_3$, etc., on the properties
intrinsic to the sequence has been identified [ref]. In particular, it has been shown in a
series of papers that for given external conditions the kinetic parameters of KPM are largely
determined by equilibrium temperatures that are intrinsic to the protein sequence [ref].



It is now known that for foldable sequences there are at least two equilibrium temperatures
that determine the "phases" of proteins [ref]. One is the high temperature $T_\theta$, at which
the chain undergoes a transition from a random coil conformation to a collapsed state; $T_\theta$
is very similar to the collapse transition temperature introduced by Flory to describe the
so-called $\theta$-point in homopolymers. Since there are several distinct energy scales in proteins
that discriminate between the exponentially large number of compact conformations, the
chain undergoes a folding transition to the native state at a temperature $T_F \le T_\theta$. The
transition at $T_F$ is usually first order [ref], while the one at $T_\theta$ can be either first or second
order depending upon a number of factors [ref], such as the relative strength of effective two-
and three-body interactions in the polypeptide chain. Both $T_\theta$ and $T_F$ are experimentally
measurable. The folding transition temperature, $T_F$, is usually associated with the midpoint
of the temperature dependence of denaturation plots. The collapse temperature is somewhat
harder to measure experimentally. Recently $T_\theta$ has been experimentally determined for a few
proteins [37,38].



Using lattice models with Monte Carlo simulations and off-lattice models with Langevin
simulations, we have established that folding times, $\tau_F$, correlate remarkably well with a
single parameter that is intrinsic to the sequence [ref], namely

$$\sigma = \frac{T_\theta - T_F}{T_\theta} \qquad (1)$$



Therefore, $\tau_F$ can be varied by altering $T_\theta$ and $T_F$, both of which depend not only on the
sequence but also on the external conditions. It is clear that $T_F$ depends on the precise
sequence and hence can be altered by mutations. Surprisingly, $T_\theta$ also depends on the
sequence [ref] (less sensitively than $T_F$). The reason that $T_\theta$ depends on the sequence and
not just on the sequence composition is that, in addition to the hydrophobic interactions,
the interfacial interactions between the surface residues and the solvent make a large
contribution in determining the precise topology because of the finite size of proteins. The
combination of hydrophobic and interfacial interactions determines $T_\theta$, resulting in the
collapse transition temperature being sequence dependent.

The correlation between folding times and $\sigma$ suggests that the rates of folding can be
altered by changing $\sigma$ while leaving external conditions fixed. Thus the wild type protein and
a mutated one can have very different folding rates depending upon $\sigma$. This has also been
observed in RNA, in which the folding behavior can be drastically altered by a single point
mutation [ref]. Using the concepts of polymer theory one of us has shown that the
time scales characteristic of KPM can be established in terms of $\sigma$ and other experimentally
controlled parameters. Remarkably enough, the dominant time scales in the folding process
are once again controlled by $\sigma$.
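
The parameter of Eq. (1), the relative separation between the collapse and folding transition temperatures, is simple to evaluate once both temperatures are known. The sketch below is only an illustration: the function name and the two pairs of temperatures are hypothetical and do not correspond to any measured protein.

```python
def sigma(t_theta, t_f):
    """Relative gap between collapse and folding temperatures, Eq. (1).

    t_theta: collapse transition temperature (kelvin)
    t_f:     folding transition temperature (kelvin), with t_f <= t_theta
    """
    if t_theta <= 0 or t_f <= 0:
        raise ValueError("temperatures must be positive (absolute scale)")
    if t_f > t_theta:
        raise ValueError("the folding temperature should not exceed the collapse temperature")
    return (t_theta - t_f) / t_theta


# Hypothetical temperatures in kelvin, for illustration only.
examples = [
    ("sequence with nearly coincident transitions", 340.0, 330.0),
    ("sequence with well separated transitions", 340.0, 250.0),
]
for label, t_theta, t_f in examples:
    print(f"{label}: sigma = {sigma(t_theta, t_f):.3f}")
```

In the language of the text, the first hypothetical sequence (small value) would be expected to behave like a fast folder and the second (larger value) like a moderate or slow folder.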

Fast processes and native conformation nucleation-collapse mechanism 

The fast processes, by which a certain fraction $\Phi$ of the initial ensemble of denatured
molecules reaches the native state, have been shown to occur via a native conformation
nucleation-collapse (NCNC) mechanism [ref]. According to NCNC, folding is initiated
by the formation of native tertiary contacts. Once a critical number of residues form tertiary
native contacts, establishing an overall near-native topology in the transition state, the
polypeptide chain rapidly reaches the native state. In this mechanism the processes of
collapse and the acquisition of the native state are almost synchronous [ref], and hence
would be nearly indistinguishable. The time scale for NCNC has been argued to be [ref]

$$\tau_{NCNC} \approx \frac{\eta a}{\gamma}\, f(\sigma)\, N^{\omega} \qquad (2)$$

where $\eta$ is the solvent viscosity, $a$ is roughly the persistence length of the protein, $\gamma$ is the
surface tension, which tends to minimize the exposed surface area of the hydrophobic species,
and $N$ is the number of amino acid residues in the protein. The exponent $\omega$ lies in the range
$3.8 < \omega < 4.2$. The function $f(\sigma)$ was originally shown to be algebraic in $\sigma$. Numerical
studies indicate that $f(\sigma) \approx \exp(\sigma/\sigma_0)$ [ref] provides a better fit to the folding times.
There are a number of remarks concerning NCNC and $\tau_{NCNC}$ that are worth making:



If $\sigma$ is small, which implies that collapse and folding are almost synchronous, then
$\Phi \approx 1$ and the folding time coincides with $\tau_{NCNC}$. Typically this is only expected for
small proteins under optimal folding conditions. Under these circumstances folding
kinetics is expected to display two-state behavior. Several experiments suggest that
small proteins exhibit the predicted two-state behavior [ref].

From a theoretical perspective sequences with small $\sigma$ are extremely well optimized,
so that the simultaneous requirements of thermodynamic stability and kinetic
accessibility of the native state can be achieved over a relatively large temperature
range. Numerical estimates [ref] suggest that for these sequences $\tau_F \approx \tau_{NCNC}$ indeed
scales algebraically with $N$ with $\omega \approx 4$, confirming the theoretical predictions [42].

In a recent experiment Schönbrunner et al. [ref] have suggested that for the 74-residue
all β-sheet forming protein tendamistat, collapse and folding are essentially
indistinguishable. Using the experimental parameters for $\eta$ and an estimate for $\gamma$ and $a$, the
calculated folding time according to Eq. (2) is about 7 ms, which is remarkably close
to the measured value of 10 ms [ref]. This estimate also suggests that experiments in
the submillisecond regime are required to directly observe the nucleation dominated
processes.
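
To make the unit bookkeeping behind Eq. (2) explicit, the following sketch evaluates $(\eta a/\gamma)\, f(\sigma)\, N^{\omega}$ in SI units. The default parameter values ($\eta$ = 0.01 Poise, $a$ = 5 Å, $\gamma$ = 40 cal/(Å² mol), $\omega$ = 4, $f(\sigma)$ = 1) are illustrative assumptions, as is the function name; they are not the parameter set behind the tendamistat estimate quoted above. Because the result is very sensitive to $a$, $\gamma$, $\omega$ and $f(\sigma)$, the sketch is meant to show the scaling with $N$ and the order of magnitude, not to reproduce any quoted number.

```python
AVOGADRO = 6.02214076e23   # 1/mol
CAL_TO_JOULE = 4.184       # thermochemical calorie
ANGSTROM = 1e-10           # m
POISE_TO_PA_S = 0.1        # 1 Poise = 0.1 Pa s


def surface_tension_si(gamma_cal_per_a2_mol):
    """Convert a surface tension from cal/(A^2 mol) to J/m^2."""
    joule_per_molecule = gamma_cal_per_a2_mol * CAL_TO_JOULE / AVOGADRO
    return joule_per_molecule / ANGSTROM**2


def tau_ncnc(n, eta_poise=0.01, a_angstrom=5.0, gamma_cal=40.0,
             omega=4.0, f_sigma=1.0):
    """Nucleation-collapse time scale (seconds) in the form of Eq. (2).

    All default values are illustrative assumptions; f_sigma stands for
    the sequence-dependent factor f(sigma), set to 1 here.
    """
    eta = eta_poise * POISE_TO_PA_S
    a = a_angstrom * ANGSTROM
    gamma = surface_tension_si(gamma_cal)
    return eta * a / gamma * f_sigma * n**omega


for n in (50, 74, 100):
    print(f"N = {n:3d}: tau_NCNC ~ {tau_ncnc(n):.2e} s")
```

With $\omega = 4$, doubling the chain length increases the estimate sixteenfold, which is why the mechanism is expected to dominate only for small proteins.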



Three Stage Multipathway Mechanism 

In the case of moderate and slow folders it is likely that a large fraction of initially
denatured molecules does not reach the native state by the NCNC mechanism described above.
Such is the case in the refolding of hen egg lysozyme, for example [ref]. The partition factor
$\Phi$ under these circumstances is small. We now describe the approach to the native state
of the pool of molecules that follow the indirect off-pathway processes. Extensive numerical
studies of lattice and off-lattice models show that the formation of the native structure
by indirect pathways can be conveniently described by three stage multipathway kinetics
[ref]. A brief description of each of the stages along with estimates of the associated time
scale is given below.

(1) Non-specific collapse: After the folding process is initiated, the polypeptide chain
collapses into a relatively compact phase due to the hydrophobic driving force. The kinetics
in this stage are quite complex and can perhaps be described by a distribution of time scales
leading to stretched exponential behavior [ref]. It should be stressed that, in contrast to
homopolymer collapse, the initiation of collapse in proteins is not completely random. The
possible structures that are seen in this phase could depend on loop formation probability,
dihedral angle transitions, etc. It is likely that certain secondary structure elements such as
helices, which form rapidly, are already present at this stage. By a small generalization of
the arguments presented by de Gennes [ref] the time scale for non-specific collapse, $\tau_c$, can be
written as

An estimate for $\tau_c$ can be made by taking $\eta \approx 0.01$ Poise, $a \approx (5-10)$ Å and $\gamma \approx
(40-60)$ cal/(Å² mol). With $T_F \approx T_\theta/2$, $\tau_c$ is found to be between (0.02 - 3) μs for
$N = 100$. This time scale roughly coincides with the time for forming a small number of
contacts (see Eq. (9)) between residues that are far apart in sequence space.
(2) Kinetic Ordering: In this stage the folding chain discriminates between the exponentially
large number of compact conformations to form as many native contacts as possible,
which results in a lower free energy. In this phase free energy biases inherent in foldable
sequences become operative and the chain navigates to one of the competing basins of
attraction (CBA) by a very cooperative motion. At the end of this stage the chain reaches one
of the low energy misfolded structures, which have many elements in common with the native
conformation. The search among this large number of compact structures leading to the low
energy misfolded conformations has been argued to proceed by a reptation-like process with
the time constant [ref]

$$\tau_{KO} \approx \tau_D N^{\zeta} \qquad (4)$$

with $\zeta \approx 3$. The time constant $\tau_D$ corresponds to a local dihedral angle transition and is
approximately $10^{-8}$ s. Thus, $\tau_{KO} \approx 10$ ms for $N = 100$.

(3) All-or-None: The last stage in the off-pathway processes involves an activated transition
from one of the many minimum energy structures to the native state. This process
necessarily involves unraveling of the chain (at least partially) in order to break the incorrect
contacts and subsequently form the native contacts. The partial unraveling of the chain in
the process of transition to the native state has been observed in numerical simulations of
minimal models of proteins [ref]. It has been argued that the average free energy barrier
separating the misfolded structures and the native state scales as $\sqrt{N}\,k_B T_F$ under
certain optimal folding conditions, so that the folding time for the slow process is

$$\tau_F \approx \tau_0 \exp(\sqrt{N}) \qquad (5)$$

at $T \approx T_F$. Numerical simulations and more recently experiments [ref] suggest that $\tau_0 \approx
10^{-6}$ s, so that $\tau_F$ for $N = 100$ is of the order of 0.1 s.
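
The time scales of the last two stages can be evaluated directly from Eqs. (4) and (5). The sketch below assumes $\tau_D \approx 10^{-8}$ s (the value implied by the 10 ms kinetic ordering estimate for $N = 100$ with an exponent of 3) and $\tau_0 \approx 10^{-6}$ s; both prefactors, like the function names, are assumptions made for illustration, and the results should be read as order-of-magnitude estimates only.

```python
import math


def tau_kinetic_ordering(n, tau_d=1e-8, zeta=3.0):
    """Kinetic ordering time (seconds), Eq. (4): tau_D * N**zeta.

    tau_d is the assumed local dihedral transition time in seconds.
    """
    return tau_d * n**zeta


def tau_all_or_none(n, tau_0=1e-6):
    """Activated all-or-none time (seconds), Eq. (5): tau_0 * exp(sqrt(N)).

    Applies near the folding transition temperature, where the barrier
    is taken to scale as sqrt(N) in units of the thermal energy.
    tau_0 is an assumed prefactor in seconds.
    """
    return tau_0 * math.exp(math.sqrt(n))


# Chain lengths: 100 residues as in the text, 129 residues for lysozyme.
for n in (100, 129):
    print(f"N = {n}: kinetic ordering ~ {tau_kinetic_ordering(n):.1e} s, "
          f"all-or-none ~ {tau_all_or_none(n):.1e} s")
```

For a chain of 129 residues the all-or-none estimate comes out somewhat below 100 ms with these assumed prefactors, which is the order of magnitude compared with the slow refolding phase of lysozyme earlier in the text. Because the barrier grows only as the square root of the chain length, the slow time scale stays far below an exhaustive-search time even for longer chains.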

Since the barrier height scales only sublinearly with $N$, it is clear that foldable sequences
do not encounter the Levinthal paradox under folding conditions even if they fold by indirect
pathways. There are multiple pathways until the second stage, whereas only relatively few
pathways connect the misfolded structures and the native state. This is because, as suggested
elsewhere, the number of low energy compact structures only scales as $\ln N$. It is also
clear that if the molecules follow the three stage kinetic approach to the native state then
the transition states occur closer to the native conformation.



Foldability Principle 

There are now numerous examples of proteins that reach the native state in a few tens of
milliseconds under optimal folding conditions [ref]. The theoretical reasoning given above
indicates that under these conditions the value of $\sigma$ for these proteins is relatively small.
These observations suggest a foldability principle which can be stated as follows: A sequence
is rapidly foldable if $T_\theta \approx T_F$. By foldability we mean that both the kinetic accessibility and
thermodynamic stability are simultaneously satisfied. The foldability principle naturally
applies to small single domain proteins whose sequences can be optimized relatively easily.

It is, in fact, tempting to suggest that the foldability principle, which expresses the kinetic
accessibility criterion in terms of properties intrinsic to the sequence, is a quantitative
realization of the consistency principle [52] and the principle of minimal frustration [53]. Some time
ago Go [52] realized that spontaneous folding of proteins requires that the interactions on
short range, which are responsible for secondary structure formation, be compatible with
long range interactions, which confer global topology. Here short and long refer to distances
along the sequence. Go also realized that it is not possible in nature to produce an "ideal
protein", in which there is complete harmony between long and short range interactions.
More recently, Bryngelson and Wolynes [53] suggested, using the random energy model [ref]
as a paradigm for protein folding, that the conflicts between various energy scales should be
minimized.

If we take these principles into account we can argue that the minimization of $\sigma$ should
be a natural criterion for achieving a nearly ideal protein sequence. A heuristic argument
leading to this conclusion goes as follows. The collapse transition temperature $T_\theta$ is primarily
determined by a combination of the driving force that tends to bury the hydrophobic residues
in the core of the protein and the forces that place the hydrophilic residues at the surface.
The free energy scale $D$ determining $T_\theta$ is obtained by a balance between the hydrophobic
interactions and the interfacial energies that tend to place hydrophilic residues at the surface
to create a nearly compact structure. Thus, $k_B T_\theta \sim D$. We have recently argued that
$T_F$ can be approximated as

$$T_F \approx \frac{\Delta}{S_{NN}} \qquad (6)$$

where $\Delta$ is roughly the stability gap and $S_{NN}$ is the entropy associated with the low energy
non-native states. It is reasonable to assume that $S_{NN}$ also depends on $D$. If the driving
force is very large, then $D/k_B T \gg 1$, and the polypeptide chain will collapse into one of
an exponentially large number of conformations. Since $\Delta$ is only weakly dependent on
$N$, it follows that for large $D$, $T_F \approx 0$, which is the homopolymer limit. In the opposite
limit, $D/k_B T \ll 1$, there is not enough driving force for collapse and $S_{NN}$ once again grows
with $N$ (excluding logarithmic corrections), leading to small $T_F$. A sufficiently high value of
$T_F$ is obtained only when $S_{NN}$ is small, that is, when the number of low energy non-native
structures is not too big. Thus an optimum value of $D$ is required so that $S_{NN}$ remains small. The
existence of an optimal $D$ is also consistent with the observation that in natural proteins the
fraction of hydrophobic residues is in slight excess of 0.5 [ref]. If we use the bound that
$T_F \le T_\theta$ then we see that the optimal value for $D$ results for $T_F \approx T_\theta$. An optimal value
of $D$ implies a proper balance between long range and short range interactions so that the
hydrophobic interactions are in harmony with interfacial forces. Thus, at least heuristically,
we can conclude that the consistency principle and the principle of minimal frustration
[53] suggest that $\sigma$ should be small for optimally designed proteins. Since minimizing $\sigma$ seems
probable only for small proteins it is tempting to suggest that in nature bigger proteins are
not optimized.

Early Events in Protein Folding 

The time scale estimate for $\tau_{NCNC}$ for proteins that reach the native state by the
nucleation-collapse folding process with $\Phi \approx 1$ is roughly between 0.1 ms and a few tens of
milliseconds, depending on the length of the protein and external conditions (see Eq. (2)).
In addition, the time scale (Eq. (4)) for reaching the low energy misfolded structures by
off-pathway processes is also about a few milliseconds. It is, therefore, of interest to ask about
processes that take place on a submillisecond time scale. Following the pioneering work of
Eaton and coworkers [10,55], there has been an explosion of experimental papers [ref]
probing protein folding events on short time scales using a variety of techniques. In the
original experiments Jones et al. [ref] used optical triggering to refold cytochrome c in a
chemical denaturant. More recently temperature jump [ref], electron transfer [ref], and other
novel mixing techniques have been used to induce and observe protein folding on
submillisecond time scales. The major conclusion of almost all these experiments is that significant
self-assembly of proteins begins on time scales as small as a microsecond.

From a general perspective a question of some importance is whether there is an upper
limit for the rate of protein folding. In fact, in a recent interesting article explaining the
so-called "new view" of protein folding, Dill and Chan [ref] put down a "wish list" for
experimental studies of folding kinetics. One of the questions in the wish list is "What is
the fastest speed a protein can fold?". Hagen et al. have recently attempted to provide
an imaginative answer to this question by using the following reasoning. They conducted
an experiment to probe the time it takes for two residues that are far apart in sequence
space to form a contact. Such a transient contact may either be native (i.e., the contact is
present in the final folded conformation) or non-native. Using optical triggering to refold
cytochrome c, Hagen et al. [ref] estimated that the diffusion controlled rate for two sites
separated by ~50-60 residues to make a contact would be about $(100\ \mu\mathrm{s})^{-1}$. Using the
loop formation probability for stiff chains [ref] they argued that the more probable contacts
between sites separated by 10 or 20 residues can occur in about 1 μs. Since the formation of
a single tertiary contact is the most elementary folding process (besides, say, the formation
of secondary structure like a helix) on the route towards the global fold, Hagen et al. [ref]
argued that the upper limit for the folding rate of a protein should be $(1\ \mu\mathrm{s})^{-1}$.

These experimental estimates that 1 fis should be an important time scale in the initiation 
of certain events in the folding process is consistent with theoretical estimates of the time 
scales t{1) for diffusion limited contact formation between two sites that are separated by / 



residues. Guo and Thirumalai ||6T| showed using scaling arguments that r(Z) can be estimated 

as 

where < Rf > is the spatial distance separating the two sites, P{1) is the probability of 



loop formation and Dq is the effective monomer diffusion constant. The distribution 
function P{1) can be calculated by assuming that the backbone is stiff on the scale of the 
order of persistent length and is given by 0] 

$$P(l) = \Omega(N)\, \frac{1 - \exp(-l/l_p)}{l^{\theta_3}}, \qquad l \ge l_{\min} \qquad (8)$$

where $l_{\min}$ is the length of the shortest loop possible, $\theta_3$ is a universal exponent whose 
value in three dimensions is 2.2, and $l_p$ is an effective persistence length which measures 
the stiffness of the polypeptide chain backbone. The normalization factor $\Omega(N)$ depends on 
the total number of residues. If we use the Flory result $\langle R_l^2 \rangle \approx l_p^2\, l^{2\nu}$, then the time 
constant $\tau_l$ for $l > l_p$ becomes 

$$\tau_l \approx \frac{l_p^2\, l^{2\nu + \theta_3}}{D_0\, \Omega(N)}. \qquad (9)$$

If we take the experimental estimates $D_0 \approx$ 10~^ cm$^2$/s, $l_p \approx 5$ Å, and $\nu \approx 0.6$, then $\tau_l$ 
for $l = 10$ according to Eq. (9) turns out to be about 10 μs, where $\Omega(N)$ is computed 
using $l_{\min} = 7$ and $N \to \infty$. This theoretical estimate is consistent with the experimental 
measurements given the inherent uncertainties in $D_0$ and $l_p$. 
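
As a rough numerical illustration of Eqs. (8) and (9), the following Python sketch evaluates 
the normalization factor and the contact formation time. It is only a sketch under stated 
assumptions and not the calculation behind the estimate quoted above: the function names are 
hypothetical, the persistence length entering the exponential of Eq. (8) is assumed to be about 
one residue, the infinite sum for N → ∞ is truncated and completed with an integral tail, and 
the diffusion constant must be supplied by the caller. 

```python
import math

THETA3 = 2.2  # loop exponent in three dimensions (value quoted in the text)
NU = 0.6      # Flory exponent (value quoted in the text)


def omega(l_min=7, lp_residues=1.0, theta3=THETA3, n_terms=100_000):
    """Normalization of P(l) = Omega * (1 - exp(-l/l_p)) / l**theta3 for l >= l_min.

    Assumption: lp_residues (l_p measured in residues) is about 1.
    The N -> infinity sum is truncated after n_terms; the remaining tail
    is approximated by the integral of l**(-theta3).
    """
    upper = l_min + n_terms
    partial = sum(
        (1.0 - math.exp(-l / lp_residues)) / l**theta3
        for l in range(l_min, upper)
    )
    tail = upper ** (1.0 - theta3) / (theta3 - 1.0)
    return 1.0 / (partial + tail)


def loop_probability(l, norm, lp_residues=1.0, theta3=THETA3):
    """Eq. (8): probability of forming a loop of l residues."""
    return norm * (1.0 - math.exp(-l / lp_residues)) / l**theta3


def contact_time(l, d0, lp_cm=5e-8, nu=NU, theta3=THETA3, norm=1.0):
    """Eq. (9): tau_l ~ l_p^2 * l^(2 nu + theta3) / (D_0 * Omega), in seconds.

    d0 is the monomer diffusion constant in cm^2/s (caller-supplied);
    lp_cm is the persistence length in cm (5 Angstrom by default).
    """
    return lp_cm**2 * l ** (2.0 * nu + theta3) / (d0 * norm)
```

For a chosen value of `d0`, `contact_time(10, d0, norm=omega())` gives the contact time in 
seconds for l = 10. Because Eq. (9) is a pure power law in l, doubling the loop length 
increases the time by a factor of 2^(2ν+θ3), about 10.6 for the exponents above. 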

In retrospect it is not surprising that 1 μs or so turns out to be an important time 
scale in the early processes of protein folding. This was already realized based on 
theoretical arguments given in the context of the refolding of bovine pancreatic trypsin 
inhibitor [60], [62]. However, an understanding of how the formation of these contacts leads 
to further self-assembly of proteins is still lacking. This requires experiments that can 
directly observe correlated events. It is only through such experiments that a molecular 
basis for the nucleation-collapse mechanism can be provided. 

Experimental |Q and theoretical [61] studies clearly suggest that transient long-range 
contacts occur on a microsecond time scale. If these contacts are non-native and stable, 
then the polypeptide chain will subsequently collapse into a misfolded structure on a 
short (much less than about a millisecond) time scale. In proteins with small $\sigma$ the initial 
stable contacts are expected to be native, and once a sufficient number of such contacts 
form, a rapid transition to the native state takes place, presumably via the 
nucleation-collapse mechanism. As suggested above, experiments that can observe correlations 
between multiple contacts will be required to further elucidate the nature of the 
nucleation-collapse mechanism. 

Folding of RNA 

The folding pathways of large RNAs are now beginning to be probed experimentally. In 
the last year the existence of possible connections between the folding of RNA and proteins 
was pointed out |^^. From the perspective of the rough FES it is natural to expect 
some common qualitative elements in the folding kinetics of proteins and RNA O^. The usual 
arguments, like the incompatibility of the time for a random search of all conformations with 
biological folding times, apply equally well to nucleotide sequences. For RNA it is necessary 
to form correct secondary structures, namely Watson-Crick pairs between complementary 
sequences. The correctly formed secondary structures assemble to achieve the correct 
three-dimensional organization of the structural elements. Although we expect certain common 
trends in the folding of RNA and proteins, there are also fundamental differences that 
have not been explored. One major difference is that collapse of RNA typically requires 
binding of divalent ions |]6^ , [65]. 



As for proteins, it is found that certain RNA sequences fold rapidly without being trapped 
in misfolded states. These include tRNAs [36] as well as certain small group I self-splicing 
introns [^] for which $\Phi$ under in vitro conditions appears to be close to unity. The folding 
time for tRNA was estimated to be about 0.1-1 s [^], suggesting that perhaps folding 
occurs via a nucleation-collapse mechanism. 



In a recent article [26] we have begun to analyze in quantitative terms the folding kinetics 
of the Tetrahymena ribozyme in terms of the kinetic partitioning mechanism. The availability 
of considerable structural information makes the Tetrahymena ribozyme an attractive model 
for studying the folding of large RNAs. The arguments given for topological frustration 
suggest that the low energy misfolded structures become more prominent for longer 
chains, resulting in a smaller value of $\Phi$. 



These expectations are borne out in the quantitative analysis of the experiments on 
the refolding of precursor RNA containing the Tetrahymena ribozyme. The experiments of 
Emerick and Woodson [68,69] showed, using self-splicing kinetics and gel electrophoresis, that 
a population containing a mixture of active and inactive conformers is in slow exchange at 
T = 30°C. The majority of the population (> 70%) of the wild-type Tetrahymena precursor 
RNA appears to be misfolded after transcription at T = 30°C. If the RNA is heated to 
T = 75°C and annealed to T = 30°C, the percentage of inactive molecules decreases to about 
20-30%. Thus the inactive conformations (presumably misfolded) can be made to reach 
the native state by an annealing process. 

The experimental findings of Emerick and Woodson [68,69] confirm the basic picture of 
folding predicted by KPM. According to KPM, for larger RNAs one expects the chain to be 
trapped in one of the low energy structures. The relatively slow folding is a consequence 
of escape from these traps by an activated process. The partition factor at T = 30°C is 
small ($\Phi \approx 0.2$), which implies that most of the molecules reach the native conformation by 
off-pathway processes. 
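
The partitioning picture can be illustrated with a simple two-channel relaxation, in which 
a fraction Φ of the molecules folds on a fast time scale and the remainder escapes from traps 
on a slow, activated time scale. The sketch below is a minimal illustration assuming a single 
slow time scale; the function name and the time constants are hypothetical and are not fitted 
to the experiments discussed here. 

```python
import math


def non_native_fraction(t, phi, tau_fast, tau_slow):
    """Fraction of molecules that have not yet reached the native state at time t.

    phi       : partition factor (fraction folding directly)
    tau_fast  : time scale of the direct channel
    tau_slow  : time scale of activated escape from misfolded traps
    All three values are illustrative inputs, not measured quantities.
    """
    fast = phi * math.exp(-t / tau_fast)
    slow = (1.0 - phi) * math.exp(-t / tau_slow)
    return fast + slow


# With phi = 0.2 and well-separated time scales, the population that remains
# after the fast phase has finished is close to 1 - phi.
plateau = non_native_fraction(t=20.0, phi=0.2, tau_fast=1.0, tau_slow=1.0e4)
```

In this picture a small Φ shows up as a large slow-phase amplitude, which is the qualitative 
signature of a population dominated by trapped conformers. 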

Another theoretical prediction made in the context of proteins that appears to be 
consistent with experiments on RNA is the activation energy separating the misfolded states 
and the active folded conformation. Based on the temperature dependence of the conversion of 
the inactive to the active form, the barrier height was estimated to be 10-15 kcal/mol [68], [69]. 
According to the theoretical arguments such barriers are expected to scale as $\sqrt{N}\,k_B T$ H2, 
which for the Tetrahymena precursor RNA ($N = 650$) turns out to be 15 kcal/mol. The good 
agreement between the theoretical estimate and experiments suggests that typical free energy 
barriers in biomolecules are small. 
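
The arithmetic behind the 15 kcal/mol figure can be checked in a few lines. This is a sketch 
that assumes the scaling is the square root of N times k_B T with a prefactor of exactly one, 
and that T is the experimental temperature of 30°C; the function name is hypothetical. 

```python
import math

R_KCAL = 1.987204e-3  # molar gas constant in kcal/(mol K), i.e. k_B per mole


def barrier_kcal_per_mol(n_residues, temp_kelvin):
    """Barrier estimate sqrt(N) * k_B * T, assuming a prefactor of one."""
    return math.sqrt(n_residues) * R_KCAL * temp_kelvin


# N = 650 nucleotides at T = 30 C = 303.15 K
estimate = barrier_kcal_per_mol(650, 303.15)
print(round(estimate, 1))  # 15.4
```

The result, about 15.4 kcal/mol, sits at the upper end of the experimental range of 
10-15 kcal/mol. 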



Conclusions 



It is gratifying that certain general principles of folding kinetics of proteins and RNA 
can be deciphered from simple considerations [26]. Due to their polymeric nature and the 
presence of conflicting energy scales, biomolecules are intrinsically topologically frustrated. 
As a consequence the free energy surface is complex and contains not only the native basin 
of attraction (NBA) but also competing basins of attraction (CBA). The basic 
features of the kinetic partitioning mechanism (KPM) naturally emerge from this idea. 
The concepts outlined in this article should be viewed as a tentative unified proposal to 
conveniently classify the possible scenarios that can arise in the complex self-assembly of 
proteins and RNA. It is possible to extend these concepts to make testable predictions for 
the folding of specific proteins [^| and RNA, assuming more detailed models that account 
for solvent conditions and other aspects that are left out in the simplified description. It is 
nonetheless clear that our understanding of the folding kinetics will continue to grow rapidly 
through an interplay between theoretical ideas and experimental advances. 

A unified description of the folding process of proteins and RNA is expected to advance 
the study of RNA folding. For example, it is logical to suggest that fast processes in RNA 
could well determine the extent to which misfolded structures are going to slow down the 
folding process. If the similarities between the nature of folding of proteins and RNA are 
further pursued then it would imply that the organization of the folded structure of RNA also 
involves parallel pathways ||7^ . The recent semi-quantitative analysis [26] of the experiments 
of Emerick and Woodson pH] strongly suggests that folding of large RNAs does take place 
by multiple parallel pathways. Additional experiments on faster time scales are needed to 
further elucidate the nature of these pathways. 

Finally, the KPM also points to the need for chaperonin-assisted folding of proteins and 
RNAs [26]. The arguments based on KPM would suggest that only when the partition factor 
is small (less than 10%) does one require the chaperonin machinery [26], [71]. Typically, as 
suggested here and elsewhere [26], this happens only for large proteins. For these, N is 
sufficiently large so that the folding time given by Eq. not only exceeds $t_B$ but starts to 
become comparable with the time scale for aggregation processes. Under these circumstances 
the chaperones are predicted to rescue the misfolded structures by a process referred to as 
the iterative annealing mechanism [^,|7^]. A similar reasoning would suggest that for large 
RNA as well there must exist RNA chaperones which presumably function in a manner 
similar to GroEL and GroES. The RNA chaperones have not yet been identified, although certain 
non-binding RNA proteins have been shown to enhance the rate of RNA-catalyzed reactions 
in vitro |71,|71. 

FIGURE CAPTION 



Fig. 1. A pictorial representation of the kinetic partitioning mechanism. The unfolded 
structures collapse rapidly (in, perhaps, microsecond time scales). These structures 
contain almost all of the secondary structures. These structural elements are shown as blocks 
labeled A - G. Some of these blocks show helices, which are expected to form in 
sub-microsecond time scales, while other blocks show beta-strands, which form in about ten 
microseconds. The subsequent packing of these secondary structural elements results in a 
fraction of the population $\Phi$ going directly to the native state via the native conformation 
nucleation-collapse mechanism. An example of a transition state obtained along this 
pathway is displayed as an expanded version of the native conformation. This structure contains 
all native-like contacts, and the lack of native contacts between blocks A and G and between 
B and F makes this structure somewhat larger than the native state. The remaining fraction of 
the molecules, $1 - \Phi$, gets trapped in misfolded structures, an example of which is shown in 
the upper right corner. In this case helices A and G have an incorrect orientation and a 
non-native contact between B and E has been formed. The activated transitions from the misfolded 
structures to the native state involve partial unraveling of the polypeptide chain to break 
the incorrect contacts and establish native contacts. In this highly simplified representation 
hydrophobic portions of the sequence are shown in blue and the hydrophilic ones in red. 

This figure "figure.gif" is available in "gif" format from: 

http://arXiv.org/ps/cond-mat/9704067v2 